Abstract — It is well known that cross-layer scheduling, which
adapts power, rate and user allocation, can achieve a significant gain
in system capacity. However, conventional cross-layer designs
all require channel state information at the base station (CSIT),
which is difficult to obtain in practice. In this paper, we focus on
cross-layer resource optimization based on ACK/NAK feedback
flows in OFDM systems without explicit CSIT. While the problem
can be modeled as a Markov Decision Process (MDP), a brute-force
approach by policy iteration or value iteration cannot lead to
any viable solution. Thus, we derive a simple closed-form solution
for the MDP cross-layer problem, which is asymptotically
optimal for sufficiently small target packet error rate (PER). The
proposed solution also has low complexity and is suitable for
real-time implementation. It is also shown to achieve a significant
performance gain compared with systems that do not utilize
the ACK/NAK feedback for cross-layer design, and with cross-layer
systems that utilize very unreliable CSIT for adaptation with
mismatch in CSIT error statistics. Asymptotic analysis is also
provided to obtain useful design insights.

Index Terms — ACK, Acknowledgement, Cross-Layer, Feed- 
back, Scheduling, Markov Decision Process, MDP, No CSI, Power 
Adaptation, Rate Adaptation 



I. Introduction 
A. Background and motivation 

Cross-layer scheduling has been shown to achieve a sig- 
nificant performance gain in wireless systems as a result 
of multiuser diversity gain. Most of the existing cross-layer 
designs heavily rely on either perfect CSIT [6] [13] [14] or 
imperfect [15] [18]/ delayed CSIT [7] [19]. 

1) Absence of Accurate CSIT and CSIT error statistics:
Perfect CSIT is difficult to obtain in practice, especially in
FDD systems in which explicit feedback is required. With
imperfect CSIT,¹ systematic packet errors would result even
if powerful error correction codes are applied. This is because,
given the imperfect CSIT, there is uncertainty about the
instantaneous mutual information at the base station, and the
scheduled data rate may exceed the instantaneous mutual information,
leading to packet errors (channel outage) despite the use of
powerful error correction coding. It has been shown [5] [20]
that packet errors cause significant degradation in cross-layer
performance. Some works take imperfect CSIT or limited CSIT
feedback into account in the cross-layer
design. For example, in [16] [17], the authors studied the

¹ There are two meanings behind "imperfect CSIT" in the literature. The
first meaning of imperfect CSIT refers to partial knowledge of CSIT, such
as limited feedback, where the partial CSIT knowledge is received accurately
(without errors) or timely (no delay). On the other hand, the second meaning
of imperfect CSIT refers to inaccurate knowledge of CSIT (either with CSIT
errors or outdatedness). In this paper, the term "imperfect CSIT" refers to the
second meaning.



cross-layer design with noiseless limited feedback. In [15]
[19], the authors studied OFDMA cross-layer design with
outdated CSIT. However, in all these works, the CSIT obtained
is either noiseless (or has no delay) or the statistics of the CSIT
errors are assumed to be known [7]. In practice,
knowledge of the CSIT error statistics, such as the CSIT error
variance and the CSIT delay, is needed, and it is not easy to
obtain because it depends on the mobility of the users as well
as the multipath profile. It is quite challenging to have a robust
cross-layer scheduling solution without knowledge of the CSIT
error variance. On the other hand, regardless of the CSIT, there
are always ACK/NAK flows between the mobiles (MS) and
the base stations (BS). A robust cross-layer scheduler should
make the best use of the ACK/NAK information, which is
embedded in the protocol.²

2) Accommodation of mobiles with different receiver capability:
Conventional cross-layer design that utilizes CSIT to
perform resource allocation is essentially an open-loop system,
because the BS cannot determine whether a packet is received correctly
even with knowledge of the CSIT (due to decoding
errors). In practice, the system may have a heterogeneous mix
of mobiles with different capabilities (e.g. some have turbo
decoding capability while others only have simple detection
capability). To accommodate this heterogeneous mixture of
receiver capabilities in the resource allocation, the BS has
to rely on ACK/NAK flows (because the ACK/NAK flows
indicate whether a packet can be decoded
successfully or not). This closed-loop information cannot be
obtained in a CSIT-based scheduler.

3) Heuristic Approach in existing literature: Recognizing
the importance of utilizing the ACK/NAK in the resource
allocation at the BS, there are existing works that discuss
power control using ACK/NAK feedbacks. However, most
of these works either consider power control on a wireless
link only and utilize heuristic algorithms, or
study the performance by simulation. For example, power
adaptation designs and performance studies utilizing ACK/NAK
feedbacks for point-to-point systems have appeared in [21]-[24].
Cross-layer scheduling utilizing ACK/NAK feedbacks
was investigated in [9] [10] [25]. In particular, power control,
rate adaptation and user scheduling for flat fading channels
and frequency selective channels were carried out in [9] and
[10] respectively, whereas a rate adaptation scheme based
on ACK/NAK feedbacks was proposed in [25]. The authors
proposed a 2-level hierarchical stochastic scheduling algorithm
based on learning automata (LA) for an AWGN channel by
rate adaptation. Although the algorithm was shown to converge
to the true channel state values, the convergence is not proven
to maximize the throughput, which is usually the practical
concern. Moreover, in all these works [9] [10] [25], the algorithm
designs are based on heuristic solutions and it is not clear what
the best possible performance from the ACK/NAK information
is. Furthermore, the suboptimal solutions obtained have high
complexity and are not suitable for real-time implementation.
Finally, in all these existing designs, there is no mechanism
to control the per-user packet error rate (PER) to a given target
level. Yet, being able to control the per-user PER of the wireless
sessions is very important for meeting the requirements of
applications (e.g. voice and video codecs).

² In a similar way as outer-loop power control in CDMA systems.

Motivated by all the reasons above, we propose a robust
closed-loop cross-layer design for OFDM systems where no
explicit CSIT knowledge is needed at the base station. The
cross-layer power allocation, user assignment and rate
allocation are adaptive to the built-in 1-bit ACK/NAK
feedbacks [1] [2] [3] from the selected users. Since they are
built in at the link layer of most wireless systems, the ACK/NAK
feedbacks add no incremental cost to the proposed closed-loop
design. Moreover, since the cross-layer solution is driven
by the ACK/NAK feedbacks, it makes the
cross-layer performance robust with respect to uncertainty in the
CSIT and propagation parameters. This robustness cannot be
obtained by utilizing explicit limited CSIT feedback. However,
there are several challenges in solving the problem:



B. Technical Challenges 

1) Issues of packet errors: Conventional cross-layer
optimization only considers the sum ergodic capacity as the
optimization objective. Ergodic capacity only considers the b/s/Hz
transmitted by the BS regardless of packet errors. As a result,
ergodic capacity is a reasonable performance metric only when
the packet error is negligible (which is the case with perfect
CSIT and very strong coding). However, in our case without
CSIT, there are always systematic packet errors (due to channel
outage) and these cannot be alleviated by just using strong
coding. To accommodate packet errors, we have to use system
goodput (b/s/Hz successfully received by the mobiles) as our
performance metric. Note that goodput reduces to ergodic
capacity in the case of no errors, but in general, to deal with
goodput, we need to deal with the CDF of the mutual information
(rather than the first-order moment only), and this imposes some
technical challenges on the problem.

2) Issue of the MDP complexity: While the problem
belongs to the MDP class, it is well known that there is usually no simple
solution (even numerically) using standard value-iteration and
policy-iteration methods (see details in section II). For
instance, the MDP here belongs to the class with infinite state space,
and the brute-force approach has exponential complexity in the
number of packet slots M; hence, it cannot give useful
solutions. Instead of a brute-force solution, we exploit some
special structure of the OFDM system and obtain a low-complexity
closed-form solution, which is asymptotically optimal for
sufficiently small PER target.



3) Asymptotic Performance: As pointed out, all existing 
solutions are heuristic in nature and studied performance 
purely by simulations. This is because of the challenging 
nature of the problem. In this paper, we shall derive some 
asymptotic properties on the system performance so as to 
obtain some design insights. 

C. Summary of Contributions 

We consider the downlink of a wireless system with a
base station and K mobile users over frequency selective
fading channels (OFDM). The base station adapts the
downlink rate, power and user selection in an OFDM system
based on the ACK/NAK feedbacks from the mobiles. To
account for potential packet errors due to channel outage,
we consider the average system goodput, which measures the
number of bits successfully transmitted, as our performance
measure. The robust cross-layer design is modelled as a
Markov Decision Process (MDP) [4] [35] [36] [37] with
power, rate and user selection policies as the optimization
variables, so as to optimize the average system goodput while
maintaining a target PER. It is well known that MDP-based
problems [26] [27] generally require complex value iteration
algorithms. However, in this paper, we shall derive a simple
closed-form solution for the MDP cross-layer problem which
is asymptotically optimal for sufficiently small target PER.
The proposed solution has low complexity and is suitable for
real-time implementation. It is also shown to achieve a significant
performance gain compared with systems that do not utilize
the ACK/NAK feedbacks for cross-layer design, and with
cross-layer systems that utilize very unreliable CSIT for adaptation
with mismatch in CSIT error statistics. Furthermore, since the
ACK/NAK feedbacks are generated by the mobiles based on
CRC checking after packet detection, the proposed closed-loop
cross-layer scheme is very flexible in the sense that it
can automatically accommodate mobiles with different receive
sensitivities in the RF or variations in the baseband estimation
and decoding algorithms. Hence, the proposed scheme achieves
a significant goodput gain with built-in robustness against
channel fluctuations as well as variations across the capabilities of
different mobile receivers.

II. A Review on Markov Decision Process 

MDP has found applications in ecology, economics and 
communications engineering since 1950 [28]. MDP is a 
modeling tool which describes a sequential decision making 
process. It is used to make the optimal sequence of decisions 
where outcomes of the problem are partly random and partly 
depend on such decisions. The advantage of MDP is that it 
provides a systematic framework for analysis of optimality, 
existence, dynamics and convergence of solutions. 

A complete description of an MDP problem involves decision
epochs, a state space, a control policy, a state transition
kernel and a reward function. The time line is first
divided into decision epochs, at which the controller makes
decisions on control actions and the system receives rewards.
Specifically, at the m-th decision epoch,
the system occupies a state $s_m \in \mathbb{S}$, where $\mathbb{S}$ denotes the state
space. Based on the observation of the causal state sequence
$s_1, \ldots, s_m$, the controller takes a control action $a_m \in \mathbb{A}$, where
$\mathbb{A}$ is the set of actions. A control policy $\pi$ is defined to be the
set of actions for all possible state sequences. Based on the
action $a_m$ and the current state $s_m$, the system receives a reward
$R(s_m, a_m)$ and moves to the next state $s_{m+1}$ according to
the state transition probability kernel $P(s_m, a_m, s_{m+1})$. The
optimization problem is to find the optimal control policy so as
to maximize the total reward: $\arg\max_{\pi} \sum_{m \geq 1} R(s_m, a_m)$.
As a result, an MDP problem can be characterized by the tuple
$(\mathbb{S}, \mathbb{A}, P(\cdot,\cdot,\cdot), R(\cdot,\cdot))$. One reason why the MDP problem is
difficult is the huge dimension of the optimization variable, namely
the entire policy $\pi$. As a result, a key step in solving the
MDP is known as divide-and-conquer. Specifically, instead of
optimizing over the entire problem at once, it can be shown that the MDP
can be solved by optimizing the actions on a per-stage
basis.
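
The per-stage principle described above can be sketched in code. The following is a minimal illustration, assuming a small finite state space and a finite horizon; the two-state "good/bad" channel, the "low/high" rate actions and all numerical values are hypothetical and are not taken from this paper.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def backward_induction(
    states: List[State],
    actions: List[Action],
    transition: Callable[[State, Action], Dict[State, float]],
    reward: Callable[[State, Action], float],
    horizon: int,
) -> Tuple[List[Dict[State, float]], List[Dict[State, Action]]]:
    """Solve a finite-horizon MDP one decision epoch at a time, last epoch first."""
    # value[m][s]: best expected total reward collected from epoch m onward.
    # value[horizon] is the terminal value, set to zero for every state.
    value = [dict() for _ in range(horizon)] + [{s: 0.0 for s in states}]
    policy = [dict() for _ in range(horizon)]
    for m in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_value = None, float("-inf")
            for a in actions:
                # Per-stage reward plus expected value of the next state.
                expected_next = sum(
                    prob * value[m + 1][s_next]
                    for s_next, prob in transition(s, a).items()
                )
                candidate = reward(s, a) + expected_next
                if candidate > best_value:
                    best_action, best_value = a, candidate
            value[m][s] = best_value
            policy[m][s] = best_action
    return value, policy


# Hypothetical toy model: the channel stays in its state with probability 0.8.
def toy_transition(s: State, a: Action) -> Dict[State, float]:
    return {"good": 0.8, "bad": 0.2} if s == "good" else {"good": 0.2, "bad": 0.8}


# Hypothetical rewards: a high rate only pays off when the channel is good.
def toy_reward(s: State, a: Action) -> float:
    if a == "low":
        return 1.0
    return 2.0 if s == "good" else 0.0


value, policy = backward_induction(
    ["good", "bad"], ["low", "high"], toy_transition, toy_reward, horizon=2
)
```

Each epoch only requires a maximization over single actions, which is the divide-and-conquer step; the cost still grows with the number of states, which is why this direct approach does not scale to the infinite state space considered later.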

There has been a lot of in-depth analysis of MDP [28]
[29] and of different branches of the problem. Different analyses
are needed for finite state space problems vs. infinite state
space problems; finite horizon problems vs. infinite horizon
problems; unconstrained MDP vs. constrained MDP, etc. By
constrained MDP, we mean that the problem has one or more
constraints on the feasible policy space $\pi$. Constrained MDP
problems are closely related to communication problems [29]
such as power and rate control problems with an average delay
constraint [30]; scheduling problems involving routing in
ad-hoc networks [32]; or handoff problems [31]. For example, in
[31], the authors optimized the occurrence of path
optimizations for inter-switch handoffs in wireless ATM networks. The
expected total cost per call, including the switching/handoff
cost and signaling costs, is modeled as an infinite-horizon
semi-Markov decision process [33] with discount rate. This
expected total cost is the objective function to be minimized.
At each decision epoch, the decision maker can choose whether
or not to do path optimization, which is modeled in the action
set. Using the divide-and-conquer principle, the MDP problem can
be solved using the value iteration algorithm or the policy iteration
algorithm [34]. The model is then extended to have QoS
constraints.

This paper is outlined as follows. The channel model is
first presented in section III. In section IV, the problem
formulation is given as a cross-layer optimization problem
and as an MDP problem. The conventional solutions of MDP are
provided at the end of section IV. The proposed solution,
which is asymptotically optimal, is presented in section V.
Simulation results are analyzed in section VII. Section
VIII presents the conclusion.

III. Channel Model 

We consider a downlink cross-layer scheduling problem in
a frequency selective, block fading (in frequency) and quasi-static
(in time) channel. The bandwidth is divided into D
frequency blocks. The fading gain in each frequency block
is flat. With the use of OFDM, the fading of each frequency
block is independent of the other frequency blocks. Also, in the
time domain, we assume that the channel remains quasi-static


Fig. 1. The channel model is represented graphically. In the frequency
domain, assume D = 4 frequency blocks within N subcarriers; there are
N/D subcarriers in each frequency block and they have the same frequency gains.
In the time domain, the channel remains unchanged within T seconds: a time slot.
M packets are transmitted in a time slot. Each packet consumes T/M
seconds: a packet slot.



for a period of T seconds, which we call a time slot.
Thus, the fading gains on each frequency block remain the
same throughout a time slot. Within a time slot, we send M
packets, each occupying the same amount of time, a packet slot
of $T/M$ seconds. From now on, the names packet slot and slot are
used interchangeably. With frequency block fading, there are
N frequency sub-carriers, of which $\lfloor N/D \rfloor$ frequency sub-carriers
having the same fading gain form a block, and there are D
blocks in total. The fading gain of each frequency
block is assumed to be independent of the other blocks. The
model is summarized in figure 1.

Denote the number of users in the system by K. Each user
k sees a vector channel $\mathbf{h}_k = [h_{k,1}, \ldots, h_{k,D}]$, where $h_{k,j}$ is
the channel power of frequency block j of user k. Stacking
all vector channels, we have a channel power matrix $\mathbf{H}$:

$$ \mathbf{H} = \begin{pmatrix} \mathbf{h}_1 \\ \vdots \\ \mathbf{h}_K \end{pmatrix} = \begin{pmatrix} h_{1,1} & \cdots & h_{1,D} \\ h_{2,1} & \cdots & h_{2,D} \\ \vdots & \ddots & \vdots \\ h_{K,1} & \cdots & h_{K,D} \end{pmatrix} $$  (1)

Note that each entry $h_{k,d}$ is exponentially distributed with unit
mean and variance. Denote the ACK/NAK feedback from each
user k during packet slot m by $v_{k,m}$. Then,

$$ v_{k,m} = \begin{cases} 1, & \text{ACK is received from user } k \text{ in slot } m; \\ 0, & \text{NAK is received from user } k \text{ in slot } m, \end{cases} $$  (2)

where ACK is received when packet m is successfully
decoded and NAK is received when packet m is in error.

The closed-loop cross-layer scheduler is shown in figure
2. There are three optimization parameters, namely the user
selection $a_m$, the power level $p_m$ and the rate $r_m$. The parameters are
determined for each packet m. At the receiver side, each user k
would decode the packet and send a 1-bit ACK/NAK feedback
$v_{k,m}$ to the transmitter. In the m-th packet slot, the maximum
achievable rate in bits is

$$ c(p_m, \mathbf{h}_{a_m}) = \frac{NT}{DM} \sum_{d=1}^{D} \log_2\left(1 + \frac{p_m h_{a_m,d}}{N}\right) $$  (3)

where the noise power is normalized to one.

Now, we can rewrite equation (2) mathematically:

$$ v_{k,m} = \begin{cases} 1, & r_m \le c(p_m, \mathbf{h}_{a_m}); \\ 0, & r_m > c(p_m, \mathbf{h}_{a_m}). \end{cases} $$  (4)

Fig. 2. Closed-loop cross-layer scheduler. The user, power and rate
optimization at the BS is solely based on the 1-bit feedbacks from the MSs.

In a high-SNR environment, the maximum bits per packet slot
in equation (3) can be approximated by

$$ c(p_m, \mathbf{h}_{a_m}) = \frac{NT}{DM} \sum_{d=1}^{D} \log_2\left(1 + \frac{p_m h_{a_m,d}}{N}\right) \overset{\text{high SNR}}{\approx} \frac{NT}{DM} \left( D \log_2\left(\frac{p_m}{N}\right) + \log_2(X_{a_m}) \right) $$  (5)

where $X_k = \prod_{d=1}^{D} h_{k,d}$. This approximation significantly
reduces the complexity of the system, as the D-dimensional
channel power gain vector is replaced by a scalar. In figure 3,
we show the difference between the maximum bits per packet
slot and its approximation in (5). The approximation error is
less than 2% when the SNR is around 10 dB.
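
As a rough numerical check of the high-SNR approximation, the sketch below evaluates both expressions for one channel realization. It assumes the rate takes the form $(NT/(DM)) \sum_d \log_2(1 + p h_d / N)$ and that its approximation is $(NT/(DM)) (D \log_2(p/N) + \log_2 X)$; the function names and all parameter values are illustrative choices, not values from the paper.

```python
import math
from typing import Sequence


def bits_per_packet_slot(
    p: float, h: Sequence[float], n_sub: int, t_slot: float, m_packets: int
) -> float:
    """Assumed exact rate: (N*T/(D*M)) * sum_d log2(1 + p*h_d/N)."""
    d_blocks = len(h)
    scale = n_sub * t_slot / (d_blocks * m_packets)
    return scale * sum(math.log2(1.0 + p * h_d / n_sub) for h_d in h)


def bits_per_packet_slot_high_snr(
    p: float, h: Sequence[float], n_sub: int, t_slot: float, m_packets: int
) -> float:
    """Assumed high-SNR form: (N*T/(D*M)) * (D*log2(p/N) + log2(X)), X = prod_d h_d."""
    d_blocks = len(h)
    scale = n_sub * t_slot / (d_blocks * m_packets)
    x = math.prod(h)  # the scalar that replaces the D-dimensional gain vector
    return scale * (d_blocks * math.log2(p / n_sub) + math.log2(x))


# Hypothetical example: D = 4 blocks, N = 64 subcarriers, p/N = 100 (20 dB).
h_example = [0.5, 1.0, 2.0, 1.5]
exact = bits_per_packet_slot(6400.0, h_example, 64, 1.0, 4)
approx = bits_per_packet_slot_high_snr(6400.0, h_example, 64, 1.0, 4)
relative_gap = (exact - approx) / exact
```

Because $\log_2(1 + x) > \log_2(x)$, the approximation always underestimates the exact expression, and the gap shrinks as $p h_d / N$ grows.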

Define the cumulative distribution function (CDF) of the random
variable $X_k$ to be

$$ \phi(x) = \Pr\{X_k < x\} $$  (6)

which can be computed offline. Note that $X_k$ is unknown to
the transmitter, which updates the set of all possible values of
$X_k$ in each packet slot m using the feedback $v_{k,m}$. The set of
all possible values of $X_k$, denoted $\mathcal{X}_{k,m}$ when based on information
received through feedbacks before packet slot m, is updated as

$$ \mathcal{X}_{k,m+1} = \begin{cases} \mathcal{X}_{k,m} \cap \{X_k : c(p_m, X_k) > r_m\}, & v_{k,m} = 1; \\ \mathcal{X}_{k,m} \cap \{X_k : c(p_m, X_k) < r_m\}, & v_{k,m} = 0. \end{cases} $$  (7)



Fig. 3. Rate difference between mutual information and its approximation
in (5). The difference is less than 2% in the common operating region, between
10 and 30 dB. (Curves: mutual information; high-SNR approximation in (5).
Horizontal axis: signal power in dB, noise power = 1.)



For example, at packet slot 1, $m = 1$, the set of real channel
power gains $\mathcal{X}_{k,1}$ is all positive real numbers $\mathbb{R}^+$. A pair of power
and rate $(p_1, r_1)$ is selected. A packet is broadcast with
power $p_1$ and rate $r_1$. At the end of packet slot 1, the ACK/NAK
feedbacks $v_{k,1}$ for all users k are received. $\mathcal{X}_{k,2}, \forall k$, are then
updated using (7). At the end of packet slot 2, $\mathcal{X}_{k,3}$ are updated
accordingly, and so on. Note that the set $\mathcal{X}_{k,m}$, as described in
(7), depends solely on the causal power allocation, rate
allocation and ACK/NAK feedbacks from the users.
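
The feedback-driven set update can be sketched as an interval refinement. This is a minimal illustration under the assumption that the high-SNR rate expression is used, so that the rate is increasing in the scalar $X_k$ and the event "rate exceeds $r$" is equivalent to $X_k > \theta$ with $\theta = 2^{rDM/(NT)} (N/p)^D$; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FeasibleInterval:
    """Open interval (lower, upper) of channel values still consistent with feedback."""
    lower: float = 0.0
    upper: float = math.inf


def rate_threshold(
    p: float, r: float, n_sub: int, d_blocks: int, t_slot: float, m_packets: int
) -> float:
    """Channel value at which the assumed high-SNR rate expression equals r."""
    exponent = r * d_blocks * m_packets / (n_sub * t_slot)
    return 2.0 ** exponent * (n_sub / p) ** d_blocks


def update_interval(interval: FeasibleInterval, theta: float, ack: bool) -> FeasibleInterval:
    """ACK means the channel value exceeds theta; NAK means it lies below theta."""
    if ack:
        return FeasibleInterval(max(interval.lower, theta), interval.upper)
    return FeasibleInterval(interval.lower, min(interval.upper, theta))


# Hypothetical two-slot example with N = D = M = 1 and T = 1 for readability.
interval = FeasibleInterval()
theta_1 = rate_threshold(p=4.0, r=2.0, n_sub=1, d_blocks=1, t_slot=1.0, m_packets=1)
interval = update_interval(interval, theta_1, ack=True)   # slot 1: ACK
theta_2 = rate_threshold(p=4.0, r=3.0, n_sub=1, d_blocks=1, t_slot=1.0, m_packets=1)
interval = update_interval(interval, theta_2, ack=False)  # slot 2: NAK
```

After the ACK and the NAK, the transmitter knows only that the channel value lies between the two thresholds, which is all the channel knowledge the scheduler ever holds.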



IV. Problem Formulation 



This section presents the mathematical description
of the optimization problem. The problem is best
explained by first writing down the optimization variables, which
are the power, rate and user selection policies defined in the
following. We then provide the mathematical expression
of the system goodput, which is the optimization objective
in this paper. A problem statement and its corresponding
mathematical representation are provided. A subsection
then explains the transformation of the optimization
problem into an MDP problem.



A. Problem formulation as a cross-layer optimization problem 

For simplicity, denote the causal user assignments, rate
sequence and power sequence from slots 1 to $m-1$ by
$A_m = (a_1, a_2, \ldots, a_{m-1})$, $R_m = (r_1, r_2, \ldots, r_{m-1})$ and
$P_m = (p_1, p_2, \ldots, p_{m-1})$ respectively. Also, denote the causal
ACK/NAK feedbacks for slots 1 to $m-1$ from all users by
the matrix $V_m$:

$$ V_m = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,m-1} \\ \vdots & \vdots & & \vdots \\ v_{K,1} & v_{K,2} & \cdots & v_{K,m-1} \end{pmatrix} $$  (8)



Definition 1 (Power Allocation Policy): A power allocation
policy

$$ \mathcal{P} = \left\{ (p_m)_{V_m} : \sum_{m=1}^{M} p_m \le P_0 \right\} $$  (9)

is defined as the set of all power allocations at the m-th packet
slot, where $m \in [1, M]$. The subscript notation $(\cdot)_{V_m}$ denotes
that the power allocation at the m-th packet slot is a function
of the ACK/NAK feedbacks up to the $(m-1)$-th packet slot,
$V_m$. The power allocation policy $\mathcal{P}$ is restricted by the total
power constraint $P_0$.

Similarly, we define the rate allocation policy and user
selection policy.

Definition 2 (Rate Allocation Policy): A rate allocation
policy

$$ \mathcal{R} = \left\{ (r_m)_{V_m} : r_m \in \mathbb{R}^+ \right\} $$  (10)

is defined as the set of all rate allocations at the m-th packet
slot, where $m \in [1, M]$ and $\mathbb{R}^+$ is the set of all positive
real numbers. The policy is determined by the causal ACK/NAK
feedbacks up to slot $m-1$.

Definition 3 (User Selection Policy): A user selection
policy

$$ \mathcal{A} = \left\{ (a_m)_{V_m} : a_m \in \{1, \ldots, K\} \right\} $$  (11)

is defined as the set of all user selections at the m-th packet slot,
where $m \in [1, M]$. The policy is determined by the causal
ACK/NAK feedback sequences up to slot $m-1$. The user
selection at the m-th packet slot, $a_m$, denotes the index of the user
selected.

Let the feedback of user $a_m$ at packet slot m, in time slot z,
be $v_{a_m,m}(z)$. The number of packet errors in time slot z equals
the sum of packet errors of the M packets sent within time
slot z: $\sum_{m=1}^{M} (1 - v_{a_m,m}(z))$. The total number of packet
errors in Z time slots is $\sum_{z=1}^{Z} \sum_{m=1}^{M} (1 - v_{a_m,m}(z))$. Thus,
the packet error rate averaged over time slots is

$$ P_e = \lim_{Z \to \infty} \frac{1}{MZ} \sum_{z=1}^{Z} \sum_{m=1}^{M} \left(1 - v_{a_m,m}(z)\right). $$  (12)

As the channel gain remains quasi-static within a time slot and
is independent of that in other time slots, the averaged packet
error rate can be written as the expectation of the number of packet
errors within a time slot over channel realizations. (We drop the
time slot index z.)

$$ P_e = E_H\left[ \frac{1}{M} \sum_{m=1}^{M} \left(1 - v_{a_m,m}\right) \right] $$  (13)

where $E_H(\cdot)$ denotes expectation over the random variable $\mathbf{H}$.
Note that the packet error rate can be simplified as follows.

$$ P_e = \Pr\{c(p_m, X_{a_m}) < r_m\} $$  (14)



The average system goodput G (averaged over ergodic samples
of time slots) is given by:

$$ G = \sum_{m=1}^{M} \Pr\{c(p_m, X_{a_m}) > r_m\}\, r_m. $$  (15)



In most wireless systems, a target packet error rate (PER)
is assigned due to various application requirements. Let e be
that PER. For example, the PER, e, is of the order of 10"^ 
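for voice applications. The relation between $\mathcal{X}_{a_m,m}$ and $\epsilon$, using (5), is
given by

$$ 1 - \epsilon = \Pr\{c(p_m, X_{a_m}) > r_m \mid \mathcal{X}_{a_m,m}\} = \Pr\{X_{a_m} > \theta_m \mid \mathcal{X}_{a_m,m}\} $$  (16)

where

$$ \theta_m = 2^{\frac{DM}{NT} r_m} \left(\frac{N}{p_m}\right)^{D}. $$  (17)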
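
The PER constraint ties the threshold to the current knowledge of the channel. The sketch below picks the threshold as an empirical quantile of Monte Carlo samples restricted to the feasible interval, then converts it into a rate. It assumes the threshold relation $\theta = 2^{rDM/(NT)} (N/p)^D$ implied by the high-SNR approximation; the sampling approach, function names and parameter values are illustrative assumptions and not the closed-form solution derived in this paper.

```python
import math
import random
from typing import List, Sequence


def sample_channel_products(num_samples: int, d_blocks: int, seed: int = 0) -> List[float]:
    """Draw samples of X = product of D independent unit-mean exponential gains."""
    rng = random.Random(seed)
    return [
        math.prod(rng.expovariate(1.0) for _ in range(d_blocks))
        for _ in range(num_samples)
    ]


def threshold_for_target_per(
    samples: Sequence[float], lower: float, upper: float, eps: float
) -> float:
    """Pick theta so that about a fraction eps of the samples in (lower, upper) lie below it."""
    inside = sorted(x for x in samples if lower < x < upper)
    if not inside:
        raise ValueError("no samples fall inside the feasible interval")
    index = min(int(eps * len(inside)), len(inside) - 1)
    return inside[index]


def rate_for_threshold(
    p: float, theta: float, n_sub: int, d_blocks: int, t_slot: float, m_packets: int
) -> float:
    """Invert the assumed threshold relation: the rate whose outage threshold is theta."""
    scale = n_sub * t_slot / (d_blocks * m_packets)
    return scale * (d_blocks * math.log2(p / n_sub) + math.log2(theta))


# Hypothetical first-slot example: nothing is known yet, so the interval is (0, inf).
samples = sample_channel_products(10000, d_blocks=4)
theta = threshold_for_target_per(samples, 0.0, math.inf, eps=0.01)
rate = rate_for_threshold(p=6400.0, theta=theta, n_sub=64, d_blocks=4, t_slot=1.0, m_packets=4)
```

A smaller target PER pushes the threshold toward the lower end of the interval, which lowers the rate that can be scheduled for a given power.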



To conclude, the cross-layer optimization problem can be
formulated as

Problem 1 (Cross-layer formulation): Determine the optimal
power allocation policy $\mathcal{P}$, rate allocation policy $\mathcal{R}$ and
user assignment $\mathcal{A}$ so as to maximize the average system
goodput $G(\mathcal{P}, \mathcal{R}, \mathcal{A})$ subject to the target PER requirement
$1 - \epsilon = \Pr\{X_{a_m} > \theta_m \mid \mathcal{X}_{a_m,m}\}$ and the total power constraint
$\sum_{m=1}^{M} p_m \le P_0$.
The optimization problem above is difficult to solve due to
the huge dimension of the variables involved. Yet, we shall
illustrate below that the total system goodput G can be expressed
recursively and hence the problem above can be expressed
as a Markov Decision Process. Define $F_m(P_m, W_{m-1})$ to be
the maximized goodput sum from slot m to M (from packet
slot m to the last packet slot) subject to the power constraint $P_m$
and the causal power allocations, rate allocations and feedbacks
from users, i.e.

$$ F_m(P_m, W_{m-1}) = \max_{\mathbf{p}_m, \mathbf{r}_m, \mathbf{a}_m} E_H\left[ \sum_{i=m}^{M} v_{a_i,i}\, r_i \,\middle|\, W_{m-1} \right] $$  (18)

where $W_{m-1} = \{V_{m-1}, A_{m-1}, \Theta_{m-1} = [\theta_1, \ldots, \theta_{m-1}]\}$ and
$\mathbf{p}_m$ denotes the vector of power allocations from $p_m$ to $p_M$.
Similar notations apply to $\mathbf{r}_m$ and $\mathbf{a}_m$. The maximization is
subject to the PER requirement $\Pr\{X_{a_m} > \theta_m \mid \mathcal{X}_{a_m,m}\} = 1 - \epsilon$
and the total power constraint $\sum_{i=m}^{M} p_i \le P_m$. We first
have the following lemma about $F_m(P_m, W_{m-1})$.

Lemma 1: $F_m(P_m, W_{m-1})$ can be expressed recursively as

$$ F_m(P_m, W_{m-1}) = \max_{p_m, r_m, a_m} \left\{ (1 - \epsilon) r_m + E_{v_m}\left[ F_{m+1}(P_m - p_m, W_m) \right] \right\}. $$  (19)

Proof: See subsection IX-A in the appendix. ■
Note that the maximization variables are $p_m, r_m, a_m$, the
power, rate and user selection in packet slot m, instead of
the selections from slot m until the last slot. As a result,
this facilitates the divide-and-conquer approach to the original
optimization problem (Problem 1).
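
To make the recursion concrete, the sketch below evaluates it by brute force in a deliberately simplified setting. The assumptions are ours, not the paper's: a single user, a single frequency block (so the channel power is unit-mean exponential with CDF $1 - e^{-x}$), power restricted to integer multiples of a step, and a rate of the form `scale * log2(1 + p * x)`. It illustrates the two-branch (ACK/NAK) structure only and is not the closed-form solution proposed in this paper.

```python
import math
from functools import lru_cache


def cdf(x: float) -> float:
    """CDF of a unit-mean exponential channel power (single-block assumption)."""
    return 1.0 - math.exp(-x)


def inv_cdf(u: float) -> float:
    """Inverse of cdf for 0 <= u < 1."""
    return -math.log1p(-u)


def max_goodput(
    num_slots: int, total_power: float, eps: float, power_step: float, scale: float = 1.0
) -> float:
    """Brute-force evaluation of the per-slot recursion on a discretized power grid."""

    @lru_cache(maxsize=None)
    def best(m: int, units_left: int, lower: float, upper: float) -> float:
        if m > num_slots or units_left == 0:
            return 0.0
        # Threshold such that Pr{X > theta | lower < X < upper} = 1 - eps.
        theta = inv_cdf(eps * cdf(upper) + (1.0 - eps) * cdf(lower))
        best_value = 0.0
        for units in range(1, units_left + 1):
            p = units * power_step
            r = scale * math.log2(1.0 + p * theta)  # rate that is in outage iff X < theta
            after_ack = best(m + 1, units_left - units, theta, upper)
            after_nak = best(m + 1, units_left - units, lower, theta)
            value = (1.0 - eps) * (r + after_ack) + eps * after_nak
            best_value = max(best_value, value)
        return best_value

    return best(1, round(total_power / power_step), 0.0, math.inf)


# Hypothetical example: 3 packet slots, 10 power units, 10% target PER.
goodput_three_slots = max_goodput(num_slots=3, total_power=10.0, eps=0.1, power_step=1.0)
```

Every slot doubles the number of feedback histories that must be explored, so even this stripped-down version grows exponentially with the number of packet slots.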
From (15), the maximized system goodput is

$$ G^*(\mathcal{P}, \mathcal{R}, \mathcal{A}) = \max \sum_{m=1}^{M} E_H\{v_{a_m,m}\}\, r_m. $$  (20)

By the definition of $F_m$ in equation (18), the optimized goodput
is

$$ G^*(\mathcal{P}, \mathcal{R}, \mathcal{A}) = F_1(P_0, W_0) $$  (21)

subject to $\sum_{m=1}^{M} p_m \le P_0$ and
$\Pr\{X_{a_m} > \theta_m \mid X_{a_m} \in \mathcal{X}_{a_m,m}\} = 1 - \epsilon$,

where $W_0$ is an empty set. As a result, the optimized system
goodput $G^*(\mathcal{P}, \mathcal{R}, \mathcal{A}) = F_1(P_0, W_0)$ can be obtained
recursively from equation (19). We shall elaborate on the recursive
solution in the following sections.

B. Problem Formulation as a Markov Decision Process 

As explained in Section II, an MDP problem is characterized
by the tuple $\{T, \mathbb{S}, \mathbb{A}, P(s, a, s'), R(s, a)\}$. In our case,
the decision epochs of the base station, $T = \{1, 2, \ldots, M\}$,
correspond to the scheduling slots. In the following, we shall
discuss the association of our cross-layer optimization problem
with the MDP tuple, namely the state space $\mathbb{S}$, the action space $\mathbb{A}$,
the state transition kernel and the per-stage reward function.
Based on that, we shall formally recast the problem as an
MDP.

• State Space Association With $\Theta_m = [\theta_1, \ldots, \theta_m]$, define
$U(\Theta_m, \mathbf{v}^m)$ and $L(\Theta_m, \mathbf{v}^m)$ to be the upper bound and
lower bound of the CSI, which is the information gathered
from the ACK/NAK feedbacks and $\theta_m$ in equation (17).
The state space, $\mathbb{S}$, is a collection of the following vectors
$s$:

$$ s = \left( L(\Theta_m, \mathbf{v}^m),\; U(\Theta_m, \mathbf{v}^m),\; P_m,\; R_m,\; s^{ACK},\; s^{NAK} \right) $$  (22)

where $P_m$ is the remaining power; $R_m$ is the sum rate
from slot m to M; $s^{ACK}$ and $s^{NAK}$ are the pointers to
the states if ACK ($v_m = 1$) and NAK ($v_m = 0$) are received, respectively.
The CSI can take all possible real values and therefore
makes the state space $\mathbb{S}$ infinite. However, as illustrated in
an example in the following subsection, the decision tree
built by state transitions in our problem is a lot smaller
in size.

• Action Space and Policy Association The action taken
at each state $s$ consists of the selection of the power $p_m$, the
transmission rate $r_m$ and the user $a_m$. The
set of possible actions $\mathbb{A}$ at every state $s$ is independent
of the decision epoch m and is given by:

$$ \mathbb{A} = \mathbb{A}_{s,m} = \left\{ (p_m, r_m, a_m) \in \{p \in \mathbb{R}^+ : p \le P_0\} \times \mathbb{R}^+ \times \{1, \ldots, K\} \right\}. $$  (23)



• State Transition Kernel Association The transition
probability $P(s, a, s')$ is a real-valued function which maps
$\{\mathbb{S} \times \mathbb{A} \times \mathbb{S}\}$ to [0, 1]. In our case, the probability of
going from state $s$ to state $s'$ by action $a \in \mathbb{A}$ is time
invariant.

In each decision epoch m, a selection of actions
takes place, meaning that the base station selects the
power $p_m$ and the transmission rate $r_m$ to user $a_m$.
After every user k receives the packet, each of them
would decode the packet header and transmit a 1-bit
feedback, $v_{k,m}$, to the base station. This 1-bit feedback carries
the information of ACK (1) or NAK (0). The transition
probability captures the probability of such an ACK (1) or
NAK (0), which takes the system to a different state.
For instance, denote the current state by $s$, the state
after receiving ACK by $s^{1}$ and the state after receiving NAK by
$s^{0}$. The probability of receiving ACK is $P_a$ and that of
NAK is $1 - P_a$. The action taken is $a$. We have

$$ P(s, a, s^{1}) = P_a; $$  (24)
$$ P(s, a, s^{0}) = 1 - P_a. $$  (25)

And

$$ \sum_{s' \in \mathbb{S}} P(s, a, s') = 1. $$  (26)

The state transition probability is described in equation
(31),
in which $\theta'$ is the third element in $s'$ and $\theta$ is the third
element in $s$. The upper and lower bounds of the CSI are
modified according to the ACK/NAK feedbacks received.
After updating the bounds, the probability of ACK, which
is equal to the probability of the event that the channel
power $X_k$ lies between the lower bound and state $\theta'$, has
to equal $1 - \epsilon$, as dictated by the error constraint. Evaluating
the probability, we have equation (32).
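• Per-stage Reward To decide which actions in $\mathbb{A}$ should
be carried out, we need a decision rule $d_m$. The
decision rule $d_m$ is a history-dependent function. Define
the history $h_m$ to be a vector of past states, actions and
feedbacks:

$$ h_m = [s_1, a_1, \ldots, s_{m-1}, a_{m-1}, s_m]. $$  (27)

The recursive relation is therefore

$$ h_m = [h_{m-1}, a_{m-1}, s_m]. $$  (28)

Denote the set of all histories by $\Lambda_m$. Note that

$$ \Lambda_1 = \mathbb{S}, \quad \Lambda_2 = \mathbb{S} \times \mathbb{A} \times \mathbb{S}, \quad \ldots, \quad \Lambda_m = \mathbb{S} \times \mathbb{A} \times \cdots \times \mathbb{S} = \Lambda_{m-1} \times \mathbb{A} \times \mathbb{S}. $$  (29)

The history-dependent rule $d_m$ maps $\Lambda_m$ to $\mathbb{A}$.

A control policy is a plan specified by a sequence of
decision rules. A control policy $\pi$ is

$$ \pi = (d_1, d_2, \ldots, d_M), \quad d_i : \Lambda_i \to \mathbb{A}, \; i = 1, \ldots, M. $$  (30)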
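$$ P(s, a, s') = P(\theta_{m+1} = \theta' \mid \theta_m = \theta, a_m = a) = \begin{cases} \epsilon, & \Pr\{X_k > \theta' \mid X_k > L(\Theta_m, \mathbf{v}^m), X_k < U(\Theta_m, \mathbf{v}^m)\} = \epsilon \\ 1 - \epsilon, & \Pr\{X_k > \theta' \mid X_k > L(\Theta_m, \mathbf{v}^m), X_k < U(\Theta_m, \mathbf{v}^m)\} = 1 - \epsilon \\ 0, & \text{otherwise.} \end{cases} $$  (31)

$$ P(s, a, s') = \begin{cases} \epsilon, & \phi(\theta_{m+1} = \theta') = (1 - \epsilon)\,\phi(U(\Theta_m, \mathbf{v}^m)) + \epsilon\,\phi(L(\Theta_m, \mathbf{v}^m)) \\ 1 - \epsilon, & \phi(\theta_{m+1} = \theta') = \epsilon\,\phi(U(\Theta_m, \mathbf{v}^m)) + (1 - \epsilon)\,\phi(L(\Theta_m, \mathbf{v}^m)) \\ 0, & \text{otherwise.} \end{cases} $$  (32)

$$ P(s_m, a, s_{m+1}) = \begin{cases} \ldots, & \phi(\theta_{m+1}) = (1 - \epsilon)\,\phi(U(\Theta_m, \mathbf{v}^m)) + \epsilon\,\phi(L(\Theta_m, \mathbf{v}^m)) \\ \ldots, & \text{otherwise.} \end{cases} $$  (33)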


The per-stage reward function is

$$ R(s_m, a_m) = \begin{cases} \ldots, & \ldots \\ \ldots, & \text{if } v_m = 0; \end{cases} $$  (34)

where $s_{m+1,a}$ denotes the state at slot $m + 1$ if $s_m$ is
reached at slot m and action $a$ is taken.

Problem 2 (The MDP formulation): The MDP problem is
defined as a maximization problem of the reward function, in
our case, the system goodput $F_1(P_0, W_0)$. Thus, the problem
statement is, with a slight abuse of notation,

$$ \max_{\pi} \left\{ \sum_{m=1}^{M} R(s_m, a_m) \right\} $$  (35)

such that $\forall m = 1, \ldots, M$, $s_m, s_{m+1} \in \mathbb{S}$, $a_m \in \mathbb{A}$, $\theta_m \in \mathbb{R}^+$
and equation (33) is satisfied.

C. A State Transition Example

To illustrate the state transition of an MDP, a state transition
diagram with an assigned initial state is given in figure 4,
drawing only the transition branches corresponding to the tuples
of scheduled action and the corresponding non-zero transition
probability. Note that this diagram only shows a fragment
of the whole decision tree because there is more than one
possible initial state.

Fig. 4. A state transition diagram example. With only 2 possible outcomes
at each state (node), the state space (the number of nodes) increases
exponentially, and hence so does the problem size.

The decision tree has $O(|\theta_m|^3) \times 2^M$ elements, where $|\theta_m|$
is the number of values $\theta_m$ can take. In other words,

$$ |\mathbb{S}| = O(|\theta_m|^3\, 2^M). $$  (36)

There are $|\theta_m|$ possible values of the lower bound $L(\Theta_m, \mathbf{v}^m)$.
For example, $L(\Theta_m, \mathbf{v}^m) \in \{y_1, \ldots\}$ where $y_b < y_{b+1}$.
For each value of the lower bound $y_b$, there are $|\theta_m| - b - 1$ values
of $U(\Theta_m, \mathbf{v}^m)$ and $\theta_m$. Thus, the total number of possible
states is $1^2 + 2^2 + \ldots + |\theta_m|^2 = O(|\theta_m|^3)$.

With either positive or negative feedbacks, each state can 
only branch to 2 possible next states. Assume that we start on 
one of these states. The number of possible descendents would 
be equal to the sum of the series 1 + 2 + 2^ + + . . . + 2*^"^ 
which is 2^. Thus, the total number of nodes in the tree is 
O(|0™|)3 x2^. 
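The counting argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names `tree_nodes` and `bound_pairs` are hypothetical and chosen only for illustration. It confirms that the geometric series sums to $2^M - 1$ and that the sum of squares grows cubically.

```python
def tree_nodes(num_slots: int) -> int:
    """Nodes in a full binary ACK/NAK tree with levels 0 .. num_slots - 1,
    i.e. 1 + 2 + 2^2 + ... + 2^(num_slots - 1)."""
    return sum(2 ** level for level in range(num_slots))


def bound_pairs(num_levels: int) -> int:
    """Sum 1^2 + 2^2 + ... + num_levels^2, the pattern used to count
    (lower bound, upper bound, estimate) combinations in the text."""
    return sum(b * b for b in range(1, num_levels + 1))


if __name__ == "__main__":
    for m in (5, 10, 30):
        # Tree size for M = m packet slots, starting from one initial state.
        print(m, tree_nodes(m))
```

Even for the M = 30 packets used in the simulations, a single initial state already has about a billion descendants, which is why a table-based solution is impractical.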
Denote the elements in the state space $\mathcal{S}$ by

$$\mathcal{S} = \{s, \{s^{1}, s^{0}\}, \{s^{11}, s^{10}, s^{01}, s^{00}\}, \ldots, \{s^{q_{M-1}}\}\} \tag{37}$$

where $q_{M-1}$ denotes any possible binary sequence of length $M - 1$. The binary sequence represents the causal ACK or NAK feedbacks received. For example, state $s^{00}$ represents that 2 NAKs have been received and state $s^{101}$ represents that the first and the third transmissions are correct and the second transmission or guess is incorrect. The state $s^{q_i}$ is at the $i$-th level of the tree, which corresponds to the $(i + 1)$-th packet transmission (with the root being the zeroth level). In the diagram, only transitions with non-zero probability are drawn. The transition probability corresponding to action $A_{q_i} \in \mathcal{A}$ from state $s^{q_i}$ to state $s^{[q_i, 0]}$, meaning that a NAK is received at the $(i + 1)$-th packet transmission, is denoted by $P(s^{q_i}, A_{q_i}, s^{[q_i, 0]})$. At each state $s^{q_i}$, there are two possible transition branches:

$$\text{ACK}: \quad (A_{q_i}, P(s^{q_i}, A_{q_i}, s^{[q_i, 1]})) = ((p_{i+1}, r_{i+1}, a_{i+1}), 1 - \epsilon) \tag{38}$$

$$\text{NAK}: \quad (A_{q_i}, P(s^{q_i}, A_{q_i}, s^{[q_i, 0]})) = ((p_{i+1}, r_{i+1}, a_{i+1}), \epsilon). \tag{39}$$

D. Conventional Solutions of MDP

A conventional solution to an MDP consists of backward and forward recursions. The backward recursions set up a huge search tree/table by dynamic programming. In the forward recursions, the system states evolve through the tree. Here we adopt the Finite Horizon-Policy Evaluation Algorithm in [28] for the backward recursion.

Algorithm 1 Conventional Finite Horizon-Policy Evaluation Algorithm

1: Each node in the tree consists of following fields:

2: Initialization: $m \leftarrow M$, $\forall L, U, p_m, \theta_m$

$$F_M(P_M, S_M) = \max_{d_M(S_M)} \Pr(c(p_M, \theta_M) > r_M)\, r_M$$

3: If $m = 1$, stop. Otherwise, go to step 4.

4: $m \leftarrow m - 1$, $\forall S_m, P_m, p_m, L, U$: evaluate

$$F_m(P_m, S_m) = \max \{ P(S_m, a, S_{m+1})\, r_m + P(S_m, a, S_{m+1})\, F_{m+1}(P_m - p_m, S_m \mid v_{a_m,m} = 1) + (1 - P(S_m, a, S_{m+1}))\, F_{m+1}(P_m - p_m, S_m \mid v_{a_m,m} = 0) \}$$

such that the constraints in equation (33) are satisfied and $P(S_m, a, S_{m+1}) = 1 - \epsilon$.

5: $(p_m, d_m, r_m)$ are given by the maximizer obtained in step 4.

6: $R_m = F_m(P_m, S_m)$, which is the accumulated rate of this node and its descendants.

7: $s^{(ACK)}$, $s^{(NAK)}$ are computed in (33).

Algorithm 2 Conventional Online State Evolution Algorithm

1: Set $m = 1$ and start state

$$S = \{0, \infty, \theta_m, P_0, R_m, s^{(ACK)}, s^{(NAK)}\}$$

where $R_m$ is the maximum among the nodes with $L = 0$, $U = \infty$.

2: If $m = M + 1$, stop; otherwise go to step 3.

3: Transmit packets as prescribed by the decision rule $d_m(S_m)$ computed in Algorithm 1.

4: Receive an ACK/NAK feedback $v_{k,m}$ from each user $k$.

5: Update the upper and lower bounds of CSI:

$L = \theta_m$ if $v_{a_m,m} = 1$
$U = \theta_m$ if $v_{a_m,m} = 0$

6: Evolve to the next state $s^{(ACK)}$ or $s^{(NAK)}$ according to the bounds of CSI and the feedbacks $v_{k,m}$ $\forall k$.

7: $m \leftarrow m + 1$, go to step 2.

After building up a table in the backward recursion using Algorithm 1, from $m = M \to 1$, we have established a large binary tree in which each node represents a particular estimate of channel power and each branch corresponds to an ACK/NAK feedback. Each path from the root to a leaf corresponds to a sequence of estimates and the corresponding feedbacks. In Online Evolution (Algorithm 2), we read this tree from the root and traverse down to the leaves. Each packet is transmitted with the parameters stored in the current node, and a new node is reached according to the ACK/NAK feedbacks.

Note that the drawback of such an algorithm is that its memory requirement is huge, as there are numerous possible states. In our problem, the state space is infinite. Even if we discretize the state space as an approximation, the brute-force approach has exponential complexity in $M$ and hence cannot give viable solutions.

V. Proposed Solutions

The MDP can be solved by a backward recursion followed by a forward recursion. In this section, we shall first elaborate on the backward recursive solution, namely the Optimal State Evolution, followed by the forward recursion, namely the Online Evolution. Unlike the conventional solution for MDP, we propose a simple closed-form solution which is asymptotically optimal for sufficiently small PER. The proposed solution has complexity of only $O(M)$, in sharp contrast with the brute-force complexity $O(\exp(M))$.

A. Optimal State Evolution 

We illustrate how to combine the target PER $\epsilon$ with the knowledge obtained from feedbacks to generate estimates of channel power $\theta_m$. Note that $\theta_m$ in equation (17) is always either $\sup \mathcal{X}_{k,m}$ or $\inf \mathcal{X}_{k,m}$, as equation (7) can be rewritten as

$$\mathcal{X}_{k,m+1} = \begin{cases} \mathcal{X}_{k,m} \cap \{X_k : X_k > \theta_m\}, & v_{k,m} = 1; \\ \mathcal{X}_{k,m} \cap \{X_k : X_k < \theta_m\}, & v_{k,m} = 0. \end{cases} \tag{40}$$

The lower bound and upper bound of $\mathcal{X}_{k,m+1}$ are

$$L(\Theta_m, v_k^m) = \max\{\theta_i : v_{k,i} = 1, 1 \le i \le m\} \tag{41}$$

$$U(\Theta_m, v_k^m) = \min\{\theta_i : v_{k,i} = 0, 1 \le i \le m\}. \tag{42}$$

Combine (16) with the knowledge obtained from feedbacks:

$$\Pr(X_k > \theta_{m+1} \mid X_k > L(\Theta_m, v_k^m), X_k < U(\Theta_m, v_k^m)) = 1 - \epsilon \tag{43}$$

Rearranging the terms in equation (43), we have the dynamics of $\theta_m$.

Lemma 2: At each packet slot $m$, the estimate of channel power $\theta_m$ is computed from the causal feedbacks $v_{a_m}^{m-1}$ and the lower and upper bounds of $X_{a_m}$:

$$\phi(\theta_m) = \epsilon\,\phi(U(\Theta_{m-1}, v_{a_m}^{m-1})) + (1 - \epsilon)\,\phi(L(\Theta_{m-1}, v_{a_m}^{m-1})) \tag{44}$$

where $\phi(\cdot)$ is the cdf of $X_k$ in (6).

Proof: See subsection IX-B in the appendix. ■
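The update in Lemma 2 can be sketched in a few lines for a single user. This is only an illustration under a loud assumption: the cdf $\phi$ of (6) is not reproduced here, so a unit-mean exponential cdf is substituted as a stand-in, and the names `cdf`, `inv_cdf`, `next_estimate` and `update_bounds` are hypothetical.

```python
import math


def cdf(x: float) -> float:
    """ASSUMED stand-in for phi in (6): unit-mean exponential cdf.
    cdf(math.inf) evaluates to 1.0 because math.exp(-inf) == 0.0."""
    return 1.0 - math.exp(-x)


def inv_cdf(u: float) -> float:
    """Inverse of the assumed cdf, valid for 0 <= u < 1."""
    return -math.log1p(-u)


def next_estimate(lower: float, upper: float, eps: float) -> float:
    """Equation (44): phi(theta) = eps * phi(U) + (1 - eps) * phi(L)."""
    return inv_cdf(eps * cdf(upper) + (1.0 - eps) * cdf(lower))


def update_bounds(lower: float, upper: float, theta: float, ack: bool):
    """An ACK raises the lower bound to theta, a NAK lowers the upper bound."""
    return (max(lower, theta), upper) if ack else (lower, min(upper, theta))


if __name__ == "__main__":
    true_gain, lo, hi = 1.0, 0.0, math.inf  # illustrative channel power gain
    for m in range(1, 31):
        theta = next_estimate(lo, hi, eps=0.05)
        ack = true_gain > theta  # packet succeeds when the gain exceeds the estimate
        lo, hi = update_bounds(lo, hi, theta, ack)
        print(m, round(theta, 4), "ACK" if ack else "NAK")
```

With this stand-in cdf the estimate climbs in equal steps while ACKs arrive and the bracket $[L, U]$ closes around the true gain after the first NAK, which mirrors the qualitative behaviour the paper describes.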

B. User Selection

Evaluating the expectation in $F_m(P_m, W_{m-1})$ defined in (19), we obtain equation (45). Solving equation (45) would require a stochastic programming tree. Yet, as $\epsilon$ is small in practice, the decision tree is reduced to equation (46).

The complexity of the problem is thereby reduced from exponential to linear in $M$.

$$F_m(P_m, W_{m-1}) = \max \{ (1 - \epsilon) r_m + (1 - \epsilon) F_{m+1}(P_m - p_m, W_m \mid v_{a_m,m} = 1) + \epsilon F_{m+1}(P_m - p_m, W_m \mid v_{a_m,m} = 0) \} \tag{45}$$

$$F_m(P_m, W_{m-1}) = \max \{ (1 - \epsilon) r_m + (1 - \epsilon) F_{m+1}(P_m - p_m, W_m \mid v_{a_m,m} = 1) \}. \tag{46}$$

$$d_m(S_m) = \left( p_m = \frac{\epsilon P_m}{1 - (1 - \epsilon)^{M-m+1}},\; r_m = \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right),\; a_m = \arg\max_k L(\Theta_{m-1}, v_k^{m-1}) \right) \tag{47}$$

Lemma 3: The optimal user selection strategy of (46) is

$$a_m = \arg\max_k L(\Theta_{m-1}, v_k^{m-1}). \tag{48}$$

Proof: See subsection IX-C in the appendix. ■

C. Power Allocation

Lemma 4: The power allocation policy

$$p_m = \frac{\epsilon P_m}{1 - (1 - \epsilon)^{M-m+1}}, \tag{49}$$

where $P_m = P_0 - \sum_{i=1}^{m-1} p_i$ is the remaining power at time $m$, is an optimal policy with respect to optimization problem (46).

Proof: See subsection IX-D in the appendix. ■
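The power allocation of Lemma 4 is simple to tabulate. A minimal sketch follows; the function name `power_schedule` is hypothetical, and the values $P_0 = 24$ W and $M = 30$ are the ones quoted in the simulation settings.

```python
def power_schedule(total_power: float, num_slots: int, eps: float) -> list:
    """Per-slot powers p_1..p_M from p_m = eps * P_m / (1 - (1 - eps)^(M - m + 1)),
    where P_m is the power still unspent at the start of slot m."""
    remaining = total_power
    schedule = []
    for m in range(1, num_slots + 1):
        p = eps * remaining / (1.0 - (1.0 - eps) ** (num_slots - m + 1))
        schedule.append(p)
        remaining -= p
    return schedule


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.1):
        s = power_schedule(24.0, 30, eps)
        # Show the first and last slot power and the total that was spent.
        print(eps, round(s[0], 3), round(s[-1], 3), round(sum(s), 6))
```

Under this rule consecutive slot powers shrink by the factor $1 - \epsilon$, the whole budget is spent by slot $M$, and the schedule approaches equal power per slot as $\epsilon \to 0$.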

D. Rate Allocation 

Given the causal feedback, power and rate information $W_m$ and the channel estimate/state values $\theta_m$ in (59) at each slot $m$, the rate allocation is computed by

$$r_m = \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right). \tag{50}$$
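The rate rule $r_m = \frac{NT}{DM}\log_2((p_m/N)^D \theta_m)$ and its inverse $\theta_m = (N/p_m)^D\, 2^{r_m DM/(NT)}$, which is the substitution used in appendix IX-C, can be written as a short sketch. The defaults $N = 64$, $T = 0.1$, $D = 3$, $M = 30$ are taken from the simulation settings; the function names and the sample values of $p_m$ and $\theta_m$ are hypothetical.

```python
import math


def scheduled_rate(p: float, theta: float, N: int = 64, T: float = 0.1,
                   D: int = 3, M: int = 30) -> float:
    """r_m = (N*T)/(D*M) * log2((p/N)^D * theta)."""
    return (N * T) / (D * M) * math.log2((p / N) ** D * theta)


def theta_from_rate(r: float, p: float, N: int = 64, T: float = 0.1,
                    D: int = 3, M: int = 30) -> float:
    """Inverse relation: theta = (N/p)^D * 2^(r*D*M/(N*T))."""
    return (N / p) ** D * 2.0 ** (r * D * M / (N * T))


if __name__ == "__main__":
    r = scheduled_rate(p=80.0, theta=1.3)  # illustrative inputs only
    print(r, theta_from_rate(r, p=80.0))
```

The two functions are exact inverses in $\theta$, so a rate chosen this way is decodable exactly when the true gain exceeds the estimate.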



E. Online Evolution 



With new information $v_{k,m-1}$ arriving in each slot $m$, we proceed on the decision tree according to the updated upper and lower bounds of CSI and the feedbacks. The set $\mathcal{X}_{k,m}$ is modified to contain only the possible values of the channel power gain based on the causal ACK/NAK feedbacks: $\mathcal{X}_{k,m} = \{x : L(\Theta_m, v_k^m) < x < U(\Theta_m, v_k^m)\}$. The transmission parameters according to the decision rule are given in equation (47).

User $a_m$ is selected such that she has the largest potential channel power gain. As proved before, the power allocation is static and depends solely on the total power and the target error probability constraint. The data rate is adapted according to the channel estimate $\theta_m$ and the feedbacks $v_{a_m,m-1}$. The online scheduling policy is illustrated in Figure 5.
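The online evolution can be sketched as one loop over packet slots that combines user selection, the estimate update and the power rule. This is an illustrative sketch only: the cdf is an assumed unit-mean exponential stand-in for $\phi$ in (6), ties in the user selection are broken by lowest index, every user is assumed to report feedback in every slot, and all function names and the sample gains are hypothetical.

```python
import math


def cdf(x: float) -> float:
    """ASSUMED stand-in for phi: unit-mean exponential cdf (cdf(inf) == 1.0)."""
    return 1.0 - math.exp(-x)


def inv_cdf(u: float) -> float:
    """Inverse of the assumed cdf, valid for 0 <= u < 1."""
    return -math.log1p(-u)


def run_online(gains, total_power, num_slots, eps):
    """Simulate the scheduler for fixed true channel power gains `gains`."""
    num_users = len(gains)
    lower = [0.0] * num_users        # L = 0 before any transmission
    upper = [math.inf] * num_users   # U = infinity before any transmission
    remaining = total_power
    history = []
    for m in range(1, num_slots + 1):
        # User selection: largest lower bound (first index wins ties).
        selected = max(range(num_users), key=lambda k: lower[k])
        # Channel estimate from the selected user's bounds, as in Lemma 2.
        theta = inv_cdf(eps * cdf(upper[selected])
                        + (1.0 - eps) * cdf(lower[selected]))
        # Power allocation, as in Lemma 4.
        power = eps * remaining / (1.0 - (1.0 - eps) ** (num_slots - m + 1))
        remaining -= power
        # Every user reports ACK (gain > theta) or NAK; update its bounds.
        for k in range(num_users):
            if gains[k] > theta:
                lower[k] = max(lower[k], theta)
            else:
                upper[k] = min(upper[k], theta)
        history.append((selected, theta, power, gains[selected] > theta))
    return history, lower, upper


if __name__ == "__main__":
    hist, lo, hi = run_online([0.4, 1.3, 0.9], 24.0, 30, 0.1)
    for m, (user, theta, power, ack) in enumerate(hist, start=1):
        print(m, user, round(theta, 3), round(power, 3), "ACK" if ack else "NAK")
```

Memory use is constant per user and the work per slot is linear in the number of users, in contrast with the tree traversal of the conventional solution.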

VI. Asymptotic Analysis 

This section is devoted to proving that the goodput achieved in a packet slot equals the instantaneous mutual information of the slot, as if there were perfect CSIT, when the number of packet transmissions tends to infinity. In other words, there is zero steady-state error in the recursive solution. To prove this claim, we need the following four lemmas.

Fig. 5. Structure and implementation of the proposed solution. (Block diagram: cross-layer scheduler with user selection and channel estimate update; packet error dynamics with ACK/NAK dynamics; evolution of parameters with the updates of $\mathcal{X}_{k,m}$, $L$ and $U$.)

Lemma 5: At packet slot $m$, the user selection set $\mathbb{K}_m$ denotes the set of users who have the largest potential channel power gains,

$$\mathbb{K}_m = \{k : L(\Theta_m, v_k^m) > L(\Theta_m, v_{k'}^m), \forall k' \notin \mathbb{K}_m\}. \tag{51}$$

The user selection set $\mathbb{K}_m$ at slot $m$ is a subset of $\mathbb{K}_{m-1}$:

$$\mathbb{K}_m \subseteq \mathbb{K}_{m-1}. \tag{52}$$

The number of elements in $\mathbb{K}_m$, $|\mathbb{K}_m|$, therefore does not increase with $m$.

Proof: See subsection IX-E in the appendix. ■

Lemma 6: For all users $k$ in the user selection set $\mathbb{K}_m$ at each slot $m$, the channel power gains $X_k$ have lower bounds and upper bounds $L(\Theta_m, v_k^m)$ and $U(\Theta_m, v_k^m)$.

Proof: See subsection IX-F in the appendix. ■

Lemma 7: Define the gap between the upper and lower bounds of channel power gains to be $W_m = U(\Theta_m, v_{a_m}^m) - L(\Theta_m, v_{a_m}^m)$. $W_m$ monotonically decreases with $m$.

Proof: See subsection IX-G in the appendix. ■
Fig. 6. Average system goodput vs number of independent subbands with transmit SNR = 30 dB, $P_0$ = 24 W, $K$ = 3, $M$ = 30, PER = 0.05. (Curves: proposed scheme, goodput upper bound, round robin.)

Fig. 7. Average system goodput vs average SNR (dB) with $P_0$ = 24 W, $K$ = 3, $D$ = 3, $M$ = 30, PER = 0.05. The proposed solution has the same slope as the upper bound (with perfect CSIT). (Curves: proposed scheme, goodput upper bound, round robin.)



Lemma 8: When the number of transmissions goes to infinity, the scheduled rate $r_m$ achieves the capacity of the system in the perfect CSIT case. In other words, the scheduled rate is equal to the capacity achieved by selecting the user which gives the highest capacity and using perfect CSIT. Mathematically,

$$\lim_{m \to \infty} r_m = \lim_{m \to \infty} \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right) = c(p_m, X_{a_m}).$$

Proof: See subsection IX-H in the appendix. ■

VII. Results and Discussions 

In this section, we discuss the simulation results under the following simulation settings. The bandwidth of the system is 20 MHz, which is divided into 64 subcarriers ($N$ = 64). Across these subcarriers, there are $D$ groups of independent subbands. The time slot is $T$ = 0.1 sec, and we compare our proposed solution with two baselines. Specifically, in baseline 1, we assume the BS has perfect CSIT and performs standard power adaptation; hence, it serves as a goodput upper bound. In baseline 2, we consider round robin scheduling, which does not utilize any CSIT information and hence is robust against CSIT errors. Note that the performance of baseline 1 is obtained under the perfect CSIT assumption and is therefore not achievable. By comparing with baseline 1, we can gauge how close to optimal the proposed solution is. Similarly, by comparing with baseline 2 (which is a common approach in the absence of CSIT), we can gauge the potential performance advantage that can be captured by utilizing the built-in ACK/NAK feedback flows.

A. Effects of Number of Independent Subbands 

In Figure 6, the sum of goodput over 30 transmitted packets is plotted against the number of independent subbands $D$ with $P_0$ = 24 W, SNR = 30 dB, $K$ = 3 and target PER = 0.05.

Note that our proposed solution achieves 85% and 91% of the performance upper bound (baseline 1) when $D$ = 1 and 5 respectively. Compared with baseline 2 (RR), the proposed solution achieves a very significant 500% goodput gain. This illustrates the importance of utilizing the 1-bit ACK/NAK flows in the resource allocation.

Note that the goodput upper bound (baseline 1) decreases with $D$ in Figure 6 because the system does not take advantage of the frequency diversity, as the selected user has to transmit on every frequency channel. When the number of independent channels increases, the capacity function, being concave in channel gains, decreases.

B. Effects of Transmit SNR 

In Figure 7, there are 3 users and each user has 3 independent channels. With transmission of 30 data packets in a time slot, the system goodput of the proposed solution achieves 60% and 89% of the performance upper bound (baseline 1) in the low and high SNR scenarios respectively. Compared with baseline 2 (RR), the proposed solution has a significant 400% gain in the high SNR regime.

C. Effects of Number of Users 

Figure 8 illustrates the system goodput vs number of users for $D$ = 3, $M$ = 30, SNR = 30 dB, $P_0$ = 24 W. Similarly, the proposed scheme achieves 93% and 85% of the performance upper bound (baseline 1) with 1 user and 9 users respectively. Compared with baseline 2 (RR), the proposed scheme achieves a 400% goodput gain.

D. Effects of Target PER e 

Figure 9 illustrates the system goodput vs target PER for SNR = 30 dB, $P_0$ = 24 W, $K$ = 3, $M$ = 30 and $D$ = 3. We observe that when the target PER is low, the proposed solution is more conservative in determining the transmit data rate in order to avoid packet errors due to channel outage. On the other hand, when the target PER is high, the proposed solution becomes more aggressive in transmitting data, but the goodput is limited by the high channel outage probability. As a result, there is an optimal target PER if one is interested in optimizing the system goodput. Note that the performance upper bound of baseline 1 and the baseline 2 goodput performance are insensitive to the target PER.

Fig. 8. Average system goodput vs number of users with transmit SNR = 30 dB, $P_0$ = 24 W, $D$ = 3, $M$ = 30, PER = 0.05. Capacity increases with the number of users because of multiuser diversity, and so does the goodput of the proposed solution. (Curves: proposed scheme, goodput upper bound (baseline 1), round robin.)

Fig. 9. Average system goodput vs target PER with transmit SNR = 30 dB, $K$ = 3, $D$ = 3, $M$ = 30. With a small target PER (e.g. error-sensitive applications), the proposed solution is conservative and achieves a lower throughput. With a high PER, the proposed solution may be over-optimistic about channel quality. At a medium PER, the proposed solution gives the best performance. (Curves: proposed scheme, goodput upper bound (baseline 1), round robin.)

Fig. 10. Average system goodput vs maximum Doppler frequency (Hz). The users have i.i.d. random speeds (uniformly distributed from 0 up to the maximum) throughout the simulation. $P_0$ = 24 W, $K$ = 4, $D$ = 3, $M$ = 30, SNR = 30 dB, PER = 0.1. (Curves: proposed scheme, goodput upper bound (baseline 1), round robin.)

Fig. 11. Value of the channel gain estimate with different PER targets in different packet slots. The proposed solution maximizes goodput and therefore avoids over-estimating (which results in a NAK), hence the non-oscillating curve. A smaller target PER $\epsilon$, which is more conservative, may prolong the convergence time. (Curves: $\theta_m$ for $\epsilon$ = 0.05, $\theta_m$ for $\epsilon$ = 0.08, channel power product, conventional convergence curve.)

E. Effects of Mobility 

To study the robustness of the proposed scheme w.r.t. mobility, we assume the users have i.i.d. random speeds (with Doppler frequency uniformly distributed from 0 to $f_{d,max}$). Figure 10 illustrates the average system goodput vs $f_{d,max}$ with SNR = 30 dB, $P_0$ = 24 W, $K$ = 4 and $D$ = 3. Observe that the proposed solution is quite robust even up to a moderate mobility of 50 Hz, which corresponds to about 27 km/hr at a 2 GHz carrier frequency (using $f_d = v f_c / c$). This robustness is due to the closed-loop feedback mechanism in the proposed solution.

F. Dynamics of Strategies 

1) Tradeoff between Convergence Speed and Target PER: An example of the procedure of the algorithm is given in Figures 11 to 14.

Figure 11 plots the channel power gain estimate $\theta_m$ in a particular channel realization vs. time epoch $m$. ACKs are received until $m < 25$ and $m < 16$ for the curves with PER $\epsilon$ = 0.05 and 0.08 respectively. The upper bound of $X_{a_m}$ is updated with each NAK, and $\theta_m$ converges to the true channel power gain product. The convergence time is shorter with a high PER, because a large PER provides greater flexibility for estimation. Yet, the throughput yielded by a large PER may be lower than that of a small PER.

Moreover, conventional convergence curves would quickly climb close to the channel power gain product, overshoot, oscillate and then converge, as plotted in Figure 11. The convergence curve of our scheduling scheme does not oscillate because any additional overshoot would waste power, time and potential data transmission. Thus, our scheduling scheme increases steadily, overshoots once and converges.

Fig. 12. Scheduled power with different PER targets in different packet slots. (Curves: power allocation for $\epsilon$ = 0.01, 0.05, 0.08, 0.1 and power allocation with perfect CSIT.)

Fig. 13. Scheduled data rate with different PER targets in different packet slots. (Curves include $\epsilon$ = 0.08, $\epsilon$ = 0.1 and the goodput upper bound (baseline 1).)

2) Power Allocation Strategies for Different Outage Targets: The power allocation of the system with $P_0$ = 24 W, $K$ = 3, $D$ = 3, $M$ = 30, SNR = 30 dB is plotted in Figure 12. Note that the power allocation strategies depend on the target PER $\epsilon$. The objective is to maximize the goodput sum over all packet slots, which can be separated into the current goodput and the future goodput as in equation (46). To maximize the goodput sum for a large PER, more power should be allocated to the early slots to have as many successful transmissions as possible. Notice that, as the PER decreases, the power allocation converges to the power allocation for perfect CSIT, i.e. equal power allocation. This is because, in the extreme case of zero PER, the probability of outage is zero, meaning that we have perfect CSIT (baseline 1).

3) Rate Allocation Strategies for Different PER Targets: Assume $P_0$ = 24 W, SNR = 30 dB, $D$ = 3, $K$ = 3, $M$ = 30. The rate allocation curves with different PER targets are plotted in Figure 13. Note that the area under the curve is the throughput. The data rate achieved by baseline 1 is plotted with a dotted line. Notice that the area achieved by a small PER, 0.01, is small and the area increases as the PER increases. However, the area decreases after PER 0.07, which is the optimal PER under the current system assumptions. An over-conservative PER target yields too little goodput as $X_{a_m}$ is underestimated. An over-optimistic PER target also yields a low goodput as outage occurs when $X_{a_m}$ is overestimated.
Fig. 14. Acknowledgements from different users: top (user 1), second from the top (user 2) and so on.

The allocated rate $r_m$ increases with the growing knowledge of the channel power gain in Figure 13. It then decreases after slot 10 because the scheduler has spent half of the total power in the first 10 slots. The lower rate results from the smaller power remaining for the last 20 slots.

4) Acknowledgements Reveal CSIT: In Figure 14, the acknowledgements from user 1 (top) to user 4 (bottom) are plotted, where 1 denotes a positive acknowledgement (ACK) and 0 denotes a negative acknowledgement (NAK). After each transmission, each user decodes the packet header and feeds back to the transmitter. If a user $k$ reports a NAK at slot $m$, user $k$ has a channel power gain less than the channel power gain estimate at slot $m$, $\theta_m$. Thus, we know that $\theta_2 < X_1 < \theta_3$ and $\theta_{11} < X_4 < \theta_{12} < X_3 < \theta_{13}$. Since NAKs are received at slots 25 and 26, we know that $\theta_{24} < X_2 < \theta_{27}$.
VIII. Conclusions

In this paper, we considered the OFDM resource optimization problem based on ACK/NAK feedbacks from the mobiles without explicit CSIT at the base station. We derived a simple closed-form solution for the MDP cross-layer problem which is asymptotically optimal for sufficiently small target PER. The proposed solution also has low complexity and is suitable for realtime implementation. Simulation results revealed that the system goodput of the proposed solution achieves 89% of the performance upper bound (perfect CSIT performance) and has over 400% gain compared to round robin scheduling. Due to the built-in closed-loop feedback mechanism, the proposed scheme is shown to have robust performance against CSIT errors and different mobility. Asymptotic analysis is also provided to obtain useful design insights.

IX. Appendix 
A. Recursive Property of Goodput 

Recall equation (19). The expectation over the channel power $H$ is the same as the iterated expectation $E_{\bar{V}_m} E_{H \mid \bar{V}_m}$, where $\bar{V}_m$ denotes the feedbacks from users from slot $m$ to $M$. Recall that $V_m$, defined in (8), denotes the causal feedbacks from slot 1 to $m - 1$. Combining $V_m$ and $\bar{V}_m$ gives the whole history:

$$(V_m, \bar{V}_m) = V_M. \tag{53}$$

Evaluating the expectation yields

$$F_m(P_m, W_{m-1}) = \max E_{\bar{V}_m} \left[ \sum_{i=m}^{M} \Pr(c(p_i, X_{a_i}) > r_i)\, r_i \right]. \tag{54}$$

Separate the instantaneous goodput at slot $m$ from the goodput sum from slot $m + 1$ to $M$. Taking an iterated expectation, we obtain equation (63).

Since the first term depends on neither $\bar{V}_m$ nor $V_m$, it simplifies to (64).

Note that the second term is the expectation of $F_{m+1}(P_m - p_m, W_m)$ over $v_m$ according to equation (54). Equation (65) can then be obtained.



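B. Dynamics of $\theta_m$

Denote the event $X_k > L(\Theta_{m-1}, v_k^{m-1})$ by $\mathcal{L}$ and the event $X_k < U(\Theta_{m-1}, v_k^{m-1})$ by $\mathcal{U}$. Applying the definition of conditional probability to equation (43),

$$\frac{\Pr(X_k > \theta_m, \mathcal{L}, \mathcal{U})}{\Pr(\mathcal{L}, \mathcal{U})} = 1 - \epsilon. \tag{55}$$

Recalling the cdf of $X_k$, $\phi$, in (6), (55) can be rewritten as

$$\frac{\phi(U(\Theta_{m-1}, v_k^{m-1})) - \phi(\theta_m)}{\phi(U(\Theta_{m-1}, v_k^{m-1})) - \phi(L(\Theta_{m-1}, v_k^{m-1}))} = 1 - \epsilon. \tag{56}$$

Rearranging the terms, equation (44) is obtained.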



C. Optimal User Selection 

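This section proves that the user selection $a_m = \arg\max_k L(\Theta_{m-1}, v_k^{m-1})$ maximizes $F_m(P_m, W_{m-1})$ in (46).

Substituting $\theta_m = \left( \frac{N}{p_m} \right)^{D} 2^{\frac{r_m DM}{NT}}$ into $F_m(P_m, W_{m-1})$, we obtain equation (66). Further expanding (66), we obtain (67).

As we assume $v_m, \ldots, v_M = 1$, we have $\theta_m = L(\Theta_m, v_{a_{m+1}}^m)$ and therefore

$$\theta_{m+1} = \phi^{-1}\!\left( \epsilon\,\phi(U(\Theta_m, v_{a_{m+1}}^m)) + (1 - \epsilon)\,\phi(\theta_m) \right). \tag{57}$$

As $\phi(\cdot)$ is a cdf, $\phi(\theta_m)$ is monotonically increasing in $\theta_m$, and so is $\phi^{-1}$. Thus, $\theta_{m+1}$ increases with $\theta_m$. According to equation (67), $F_m(P_m, W_{m-1})$ increases with $\theta_m$. What remains to prove is that $a_m = \arg\max_k L(\Theta_{m-1}, v_k^{m-1})$ maximizes $\theta_m$. We prove this by contradiction. Let $k^* \neq a_m$; we have $L(\Theta_{m-1}, v_{k^*}^{m-1}) < L(\Theta_{m-1}, v_{a_m}^{m-1})$ by definition, and $U(\Theta_{m-1}, v_{k^*}^{m-1}) < L(\Theta_{m-1}, v_{a_m}^{m-1}) < U(\Theta_{m-1}, v_{a_m}^{m-1})$ by the properties of the bounds. Denote $\theta_m$ by $\Psi(k)$, where $k$ is the user selected in slot $m$. According to equation (57), $\Psi(k) < \Psi(a_m)$ $\forall k \neq a_m$. Therefore, $a_m = \arg\max_k L(\Theta_{m-1}, v_k^{m-1})$ maximizes $\theta_m$ and therefore $F_m(P_m, W_{m-1})$.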

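D. Optimal Power Selection

At the base case, we would like to maximize the goodput in the last slot $M$, which is to solve

$$F_M^{(1)}(P_M, W_{M-1}) = \max_{p_M, r_M} (1 - \epsilon)\, r_M. \tag{58}$$

Given $W_{m-1}$, $\theta_m$ can be solved by taking the inverse of the function $\phi(\cdot)$ in equation (44):

$$\theta_m = \phi^{-1}\!\left( \epsilon\,\phi(U(\Theta_{m-1}, v_{a_m}^{m-1})) + (1 - \epsilon)\,\phi(L(\Theta_{m-1}, v_{a_m}^{m-1})) \right). \tag{59}$$

Since $\theta_M = \left( \frac{N}{p_M} \right)^{D} 2^{\frac{r_M DM}{NT}}$, the optimal solution at the base case is

$$p_M = P_M, \qquad r_M = \frac{NT}{M} \log_2\!\left( \frac{P_M}{N} \right) + \frac{NT}{DM} \log_2(\theta_M). \tag{60}$$

Therefore, $r_m$ can be expressed solely in terms of $\theta_m$ and $p_m$. Recursively developing $F_1(P_0)$, we have

$$F_1^{(1)}(P_0, W_0) = \max \{ (1 - \epsilon)\, r_1 + \cdots + (1 - \epsilon)^{M} r_M \}. \tag{61}$$

With some mathematical manipulation, we obtain equation (68).

As we have assumed $v_m = 1$, $\theta_m$ can be computed for $m = 1$ to $M$. Note that $p_{m+1}, \ldots, p_M$ are of the form

$$\begin{aligned} p_{m+1} &= a_1 (P_m - p_m) \\ p_{m+2} &= a_2 (P_m - p_m - p_{m+1}) \\ &\;\;\vdots \\ p_M &= a_{M-m}(1 - a_{M-m-1}) \cdots (1 - a_1)(P_m - p_m). \end{aligned} \tag{62}$$

Therefore, the closed form of the optimal power allocation is obtained. Note that the objective function in (68) is concave in $p_m$. Substituting equation (62) into $F_m^{(1)}(P_m, W_{m-1})$ in equation (68), differentiating and setting the derivative to zero, we obtain

$$p_m = \frac{\epsilon P_m}{1 - (1 - \epsilon)^{M-m+1}} \tag{69}$$

which depends solely on $\epsilon$ and $P_m$. The solution obtained here is a lower bound of the original solution, as the problem is solved in only one direction, which assumes all positive feedbacks and corresponds to the all-positive routes in the decision tree.

The equations referenced in subsections IX-A, IX-C and IX-D are:

$$F_m(P_m, W_{m-1}) = \max_{p_m, r_m, a_m} E_{v_m} E_{\bar{V}_{m+1}} \left[ \Pr(c(p_m, X_{a_m}) > r_m \mid W_{m-1})\, r_m + \sum_{i=m+1}^{M} \Pr(c(p_i, X_{a_i}) > r_i \mid W_{i-1})\, r_i \right] \tag{63}$$

$$F_m(P_m, W_{m-1}) = \max_{p_m, r_m, a_m} \left\{ \Pr(c(p_m, X_{a_m}) > r_m \mid W_{m-1})\, r_m + E_{v_m} E_{\bar{V}_{m+1}} \sum_{i=m+1}^{M} \Pr(c(p_i, X_{a_i}) > r_i \mid W_{i-1})\, r_i \right\} \tag{64}$$

$$F_m(P_m, W_{m-1}) = \max_{p_m, r_m, a_m} \left\{ \Pr(c(p_m, X_{a_m}) > r_m \mid W_{m-1})\, r_m + E_{v_m} F_{m+1}(P_m - p_m, W_m) \right\}. \tag{65}$$

$$F_m(P_m, W_{m-1}) = \max \left\{ (1 - \epsilon) \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right) + (1 - \epsilon) F_{m+1}(P_m - p_m, W_m \mid v_{a_m,m} = 1) \right\} \tag{66}$$

$$F_m(P_m, W_{m-1}) = \max_{p_m, \ldots, p_M,\, a_m, \ldots, a_M} \left\{ (1 - \epsilon) \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right) + \cdots + (1 - \epsilon)^{M-m+1} \frac{NT}{DM} \log_2\!\left( \left( \frac{p_M}{N} \right)^{D} \theta_M \right) \right\} \tag{67}$$

$$F_m^{(1)}(P_m, W_{m-1}) = \max \frac{NT}{M} (1 - \epsilon) \left[ \log_2\!\left( \frac{p_m}{N} \right) + \cdots + (1 - \epsilon)^{M-m} \log_2\!\left( \frac{p_M}{N} \right) \right] + \frac{NT}{DM} (1 - \epsilon) \left[ \log_2(\theta_m) + \cdots + (1 - \epsilon)^{M-m} \log_2(\theta_M) \right] \tag{68}$$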
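For readers who want to check the closed form, the following is a short alternative derivation using a Lagrange multiplier instead of the substitution in (62). It is a sketch under the same all-ACK assumption, keeps only the power-dependent part of the objective, and the multiplier $\lambda$ and the shorthand $q = 1 - \epsilon$ are introduced here for illustration only.

```latex
% Power-dependent part of the objective, with q = 1 - \epsilon:
%   maximize   \sum_{i=m}^{M} q^{\,i-m+1} \log_2 (p_i / N)
%   subject to \sum_{i=m}^{M} p_i = P_m .
\begin{align*}
\frac{\partial}{\partial p_i}\Big[ \sum_{j=m}^{M} q^{\,j-m+1}\log_2\frac{p_j}{N}
   - \lambda \Big( \sum_{j=m}^{M} p_j - P_m \Big) \Big] = 0
 &\;\Rightarrow\; p_i = \frac{q^{\,i-m+1}}{\lambda \ln 2}, \\
\sum_{i=m}^{M} p_i = P_m
 &\;\Rightarrow\; \frac{1}{\lambda \ln 2} = \frac{P_m}{\sum_{j=1}^{M-m+1} q^{\,j}}
   = \frac{\epsilon\, P_m}{q\,\big(1 - q^{\,M-m+1}\big)}, \\
p_m = \frac{q}{\lambda \ln 2}
 &= \frac{\epsilon\, P_m}{1 - (1-\epsilon)^{M-m+1}} .
\end{align*}
% As \epsilon \to 0, 1 - (1-\epsilon)^{M-m+1} \approx (M-m+1)\,\epsilon,
% so p_m \to P_m / (M-m+1): equal power over the remaining slots.
```

The objective is a weighted sum of logarithms and hence concave, so the stationary point is the maximizer, and the limit $\epsilon \to 0$ recovers the equal power allocation noted for the perfect CSIT case.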

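E. Shrinking User Selection Set $\mathbb{K}_m$

Before proving this lemma, we need to introduce two properties of the lower bound of the channel power $X_k$, $L(\Theta_m, v_k^m)$.

1) Monotonic Increasing Lower Bound of Real Channel Power:

Lemma 9: The lower bound of the channel power gains $L(\Theta_m, v_k^m)$ increases monotonically with $m$.

Proof:

$$\begin{aligned} L(\Theta_m, v_k^m) &= \max\{\theta_i : v_{k,i} = 1, 1 \le i \le m\} \\ &= \begin{cases} \max\{\theta_m, \{\theta_i : v_{k,i} = 1, 1 \le i \le m - 1\}\} & \text{if } v_{k,m} = 1, \\ \max\{\theta_i : v_{k,i} = 1, 1 \le i \le m - 1\} & \text{if } v_{k,m} = 0 \end{cases} \\ &= \begin{cases} \max\{\theta_m, L(\Theta_{m-1}, v_k^{m-1})\} & \text{if } v_{k,m} = 1, \\ L(\Theta_{m-1}, v_k^{m-1}) & \text{if } v_{k,m} = 0. \end{cases} \end{aligned} \tag{70}$$

In both cases $L(\Theta_m, v_k^m) \ge L(\Theta_{m-1}, v_k^{m-1})$. ■

2) Lower Bound of Channel Power of Selected User Larger than the Upper Bound of Channel Power of the Remaining Users:

Lemma 10: Assume $\exists k \notin \mathbb{K}_{m-1}$. Then

$$U(\Theta_{m-1}, v_k^{m-1}) \le L(\Theta_{m-1}, v_{k'}^{m-1}), \quad \forall k' \in \mathbb{K}_{m-1}. \tag{71}$$

Proof: Assume $\exists k \notin \mathbb{K}_{m-1}$. Recall equation (42),

$$U(\Theta_{m-1}, v_k^{m-1}) = \min\{\theta_i : v_{k,i} = 0, 1 \le i \le m - 1\}.$$

There exists a packet slot $q$, $1 \le q \le m - 1$, such that $v_{k,q} = 0$ and $v_{k',q} = 1$, which can be described mathematically as

$$U(\Theta_{m-1}, v_k^{m-1}) = \min\{\theta_q, \{\theta_i : v_{k,i} = 0, 1 \le i \le q - 1, q + 1 \le i \le m - 1\}\}. \tag{76}$$

By definition, $\theta_q > L(\Theta_{q-1}, v_{k'}^{q-1})$ and $L(\Theta_q, v_{k'}^q) = \max\{\theta_q, L(\Theta_{q-1}, v_{k'}^{q-1}) \mid v_{k',q} = 1\}$. Thus, $L(\Theta_q, v_{k'}^q) = \theta_q$ if $v_{k',q} = 1$. Continuing from equation (76),

$$U(\Theta_{m-1}, v_k^{m-1}) = \min\{L(\Theta_q, v_{k'}^q), U(\Theta_{q-1}, v_k^{q-1}), \ldots\} \le L(\Theta_{m-1}, v_{k'}^{m-1}). \tag{72}$$

The last inequality follows from Lemma 9. ■

We now prove Lemma 5 by contradiction. Assume $\exists k \in \mathbb{K}_m$ and $k \notin \mathbb{K}_{m-1}$. At slot $m$, $\forall k' \in \mathbb{K}_{m-1}, k' \notin \mathbb{K}_m$, by Lemma 9, the lower bound of the channel power gain is monotonically increasing with $m$:

$$L(\Theta_m, v_{k'}^m) \ge L(\Theta_{m-1}, v_{k'}^{m-1}). \tag{73}$$

Also, by Lemma 10, all users outside the user selection set have upper bounds no greater than the lower bounds of users inside the user selection set: $\forall k \notin \mathbb{K}_{m-1}, k' \in \mathbb{K}_{m-1}$,

$$U(\Theta_{m-1}, v_k^{m-1}) \le L(\Theta_{m-1}, v_{k'}^{m-1}). \tag{74}$$

Because $k \in \mathbb{K}_m$, $k' \notin \mathbb{K}_m$, we have

$$L(\Theta_m, v_k^m) > L(\Theta_m, v_{k'}^m). \tag{75}$$

Thus, we have

$$\begin{aligned} L(\Theta_m, v_k^m) &> L(\Theta_m, v_{k'}^m) && (\forall k \in \mathbb{K}_m, k' \notin \mathbb{K}_m) \\ &\ge L(\Theta_{m-1}, v_{k'}^{m-1}) && (\text{by Lemma 9}) \\ &\ge U(\Theta_{m-1}, v_k^{m-1}) && (\forall k' \in \mathbb{K}_{m-1}, k \notin \mathbb{K}_{m-1}) \end{aligned} \tag{77}$$

which leads to a contradiction. Thus, $\forall k \in \mathbb{K}_m$, $k \in \mathbb{K}_{m-1}$.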

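F. Channel Estimate of Selected User between Upper and Lower Bound

We prove this claim by mathematical induction. In the base case, $m = 0$, before any transmission, we have the initialization

$$L = 0 \tag{78}$$
$$U = \infty \tag{79}$$
$$X_k \in [L, U] \quad \forall k \in \mathbb{K}_0 \tag{80}$$

where $\mathbb{K}_0 = \{1, \ldots, K\}$.

Assume the statement is true for $m = q$. We obtain

$$X_k \in [L(\Theta_q, v_k^q), U(\Theta_q, v_k^q)], \quad \forall k \in \mathbb{K}_q. \tag{81}$$

When $m = q + 1$, before the $(q + 1)$-th transmission,

$$L(\Theta_q, v_k^q) \le \theta_{q+1} \le U(\Theta_q, v_k^q), \quad \forall k \in \mathbb{K}_q. \tag{82}$$

After the $(q + 1)$-th transmission, there are two cases, either ACK or NAK. If an ACK is received, then we have

$$r_{q+1} < c(p_{q+1}, X_k)$$

or

$$\frac{NT}{DM} \log_2\!\left( \left( \frac{p_{q+1}}{N} \right)^{D} \theta_{q+1} \right) < \frac{NT}{DM} \log_2\!\left( \left( \frac{p_{q+1}}{N} \right)^{D} X_k \right) \tag{83}$$

or $\theta_{q+1} < X_k$.

The updates of the bounds are

$$L(\Theta_{q+1}, v_k^{q+1}) = \max\{L(\Theta_q, v_k^q), \theta_{q+1}\} = \theta_{q+1} \tag{84}$$

and

$$U(\Theta_{q+1}, v_k^{q+1}) = U(\Theta_q, v_k^q). \tag{85}$$

Thus, we have $\forall k \in \mathbb{K}_q \cap \{k : v_{k,q+1} = 1\}$

$$L(\Theta_{q+1}, v_k^{q+1}) < X_k < U(\Theta_{q+1}, v_k^{q+1}). \tag{86}$$

Let $\mathbb{K}_{q+1} = \mathbb{K}_q \cap \{k : v_{k,q+1} = 1\}$, which completes the proof. Similarly, if a NAK is received, $X_k < \theta_{q+1}$. The updates of the bounds are

$$L(\Theta_{q+1}, v_k^{q+1}) = L(\Theta_q, v_k^q) \tag{87}$$

$$U(\Theta_{q+1}, v_k^{q+1}) = \min\{U(\Theta_q, v_k^q), \theta_{q+1}\} = \theta_{q+1}. \tag{88}$$

Thus, we have $\forall k \in \mathbb{K}_q \cap \{k : v_{k,q+1} = 0\}$

$$L(\Theta_{q+1}, v_k^{q+1}) < X_k < U(\Theta_{q+1}, v_k^{q+1}). \tag{89}$$

Let $\mathbb{K}_{q+1} = \mathbb{K}_q \cap \{k : v_{k,q+1} = 0\}$, which completes the proof.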



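G. Monotonic Decreasing Gap between Upper and Lower Bounds

The difference of the gaps at slot $m$ and $m - 1$ is

$$\begin{aligned} W_m - W_{m-1} &= \{U(\Theta_m, v_{a_m}^m) - L(\Theta_m, v_{a_m}^m)\} - \{U(\Theta_{m-1}, v_{a_m}^{m-1}) - L(\Theta_{m-1}, v_{a_m}^{m-1})\} \\ &= \begin{cases} L(\Theta_{m-1}, v_{a_m}^{m-1}) - L(\Theta_m, v_{a_m}^m) & \text{if } v_{a_m,m} = 1, \\ U(\Theta_m, v_{a_m}^m) - U(\Theta_{m-1}, v_{a_m}^{m-1}) & \text{if } v_{a_m,m} = 0 \end{cases} \\ &= \begin{cases} L(\Theta_{m-1}, v_{a_m}^{m-1}) - \theta_m & \text{if } v_{a_m,m} = 1, \\ \theta_m - U(\Theta_{m-1}, v_{a_m}^{m-1}) & \text{if } v_{a_m,m} = 0 \end{cases} \\ &\le 0 \quad \forall m. \end{aligned} \tag{90}$$

The last inequality is due to the fact that $L(\Theta_{m-1}, v_{a_m}^{m-1}) \le \theta_m \le U(\Theta_{m-1}, v_{a_m}^{m-1})$.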

H. Scheduled Rate Achieves Capacity 

By Lemma 5, when $m \to \infty$, the user selection set degenerates to a single user who has the largest lower bound of the channel power gains, $\mathbb{K}_m = \{k\}$ where $L(\Theta_m, v_k^m) > L(\Theta_m, v_{k'}^m)$ and $k \neq k'$. Using Lemmas 6 and 7, we have

$$m \to \infty, \quad L(\Theta_m, v_k^m) = U(\Theta_m, v_k^m) = X_k. \tag{91}$$

Thus, we have

$$m \to \infty, \quad \mathbb{K}_m = \{a_m\} = \{k\}, \quad \text{where } X_k > X_{k'}, k \neq k'. \tag{92}$$

Also by Lemma 7, we have

$$m \to \infty, \quad \theta_m = L(\Theta_m, v_{a_m}^m) = U(\Theta_m, v_{a_m}^m) = X_{a_m}. \tag{93}$$

Thus, we have the scheduled rate at slot $m$,

$$\lim_{m \to \infty} r_m = \lim_{m \to \infty} \frac{NT}{DM} \log_2\!\left( \left( \frac{p_m}{N} \right)^{D} \theta_m \right) = c(p_m, X_k) \tag{94}$$

where user $k$ has the largest channel power gain. The quantity $c(p_m, X_k)$ is the capacity achieved by the system with perfect CSIT.